Setting the working directory matters. In RStudio we usually do it from Session -> Set Working Directory -> Choose Directory.
# Set the directory path manually
dirPath <- "C:/Users/USUARIO/Documents/GitHub/CC_Module3/wordcloud"
# Load the documents in that directory as a corpus
speech <- Corpus(DirSource(dirPath))
Check the structure of our text corpus:
inspect(speech)
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 1
churchill_speeches.txt
At present we lie within a few minutes’ striking distance of the French, Dutch and Belgian coasts, and within a few hours of the great aerodromes of Central Europe. We are even within canon-shot of the Continent.\n\nSo close as that! Is it prudent, is it possible, however much we might desire it, to turn our backs upon Europe and ignore whatever may happen there? I have come to the conclusion – reluctantly I admit – that we cannot get away. Here we are and we must make the best of it. But do not underrate the risks – the grevious risks – we have to run.\n\n\nThis is only the beginning of the reckoning. This is only the first sip, the first foretaste of a bitter cup which will be proffered to us year by year unless, by a supreme recovery of moral health and martial vigour, we arise again and take our stand for freedom as in the olden time.\n\n\nI would say to the House, as I said to those who have joined this Government: I have nothing to offer but blood, toil, tears and sweat. We have before us an ordeal of the most grievous kind. We have before us many, many long months of struggle and of suffering. You ask, what is our policy? I can say: It is to wage war, by sea, land and air, with all our might and with all the strength that God can give us; to wage war against a monstrous tyranny, never surpassed in the dark, lamentable catalogue of human crime. This is our policy. You ask, what is our aim?\n\nI can answer in one word: It is victory, victory at all costs, victory in spite of all terror, victory, however long and hard the road may be, for without victory, there is no survival.\n\n\nEven though large tracts of Europe and many old and famous states have fallen or may fall into the grip of the Gestapo and all the odious apparatus of Nazi rule, we shall not flag or fail. 
We shall go on to the end, we shall fight in France, we shall fight on the seas and oceans, we shall fight with growing confidence and growing strength in the air, we shall defend our island, whatever the cost may be, we shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills; we shall never surrender.\n\n\nThe battle of France is over. I expect that the Battle of Britain is about to begin. Upon this battle depends the survival of Christian civilisation. Upon it depends our own British life, and the long continuity of our institutions and our Empire. The whole fury and might of the enemy must very soon be turned upon us. Hitler knows that he will have to break us in this island or lose the war.\n\nIf we can stand up to him, all Europe may be free and the life of the world may move forward into broad, sunlit uplands. But if we fail, then the whole world, including the United States, including all that we have known and cared for, will sink into the abyss of a new Dark Age made more sinister, and perhaps more protracted, by the lights of perverted science. Let us therefore brace ourselves to our duties, and so bear ourselves that, if the British Empire and its Commonwealth last for a thousand years, men will still say, ‘This was their finest hour.\n\n\nThe gratitude of every home in our Island, in our Empire, and indeed throughout the world, except in the abodes of the guilty, goes out to the British airmen who, undaunted by odds, unwearied in their constant challenge and mortal danger, are turning the tide of the World War by their prowess and by their devotion. Never in the field of human conflict was so much owed by so many to so few.\n\n\nFrom Stettin in the Baltic to Trieste in the Adriatic, an iron curtain has descended across the Continent. Behind that line lie all the capitals of the ancient states of Central and Eastern Europe. 
Warsaw, Berlin, Prague, Vienna, Budapest, Belgrade, Bucharest and Sofia, all these famous cities and the populations around them lie in what I must call the Soviet sphere.\n\n\nI am very glad that Mr Attlee described my speeches in the war as expressing the will not only of Parliament but of the whole nation. Their will was resolute and remorseless and, as it proved, unconquerable. It fell to me to express it, and if I found the right words you must remember that I have always earned my living by my pen and by my tongue. It was the nation and race dwelling all round the globe that had the lion heart. I had the luck to be called upon to give the roar.
Remove unnecessary whitespace with 'stripWhitespace':
speech <- tm_map(speech, stripWhitespace)
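As a rough illustration of what this cleaning step does, the base R sketch below collapses runs of whitespace into a single space. It does not use tm; the helper name `collapse_whitespace` and the sample string are hypothetical and only stand in for the real corpus.

```r
# Hypothetical helper: collapse any run of whitespace (spaces, tabs,
# newlines) into one space, similar in spirit to stripWhitespace.
collapse_whitespace <- function(x) {
  gsub("[[:space:]]+", " ", x)
}

# Hypothetical sample string with extra spaces and blank lines
sample_text <- "We shall   fight\n\n\non the beaches"
clean_text <- collapse_whitespace(sample_text)
print(clean_text)
```

Applying the transformation through tm_map, as the tutorial does, keeps the result inside the corpus object so later steps can use it directly.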
Step 4: Term Document Matrix
The next step is to create a term-document matrix, a table that records how often each word occurs. We will use 'TermDocumentMatrix'.
# Create a term-document matrix
dtm <- TermDocumentMatrix(speech)
# Convert it to a regular matrix
m <- as.matrix(dtm)
# Sort it to show the most frequent words first
v <- sort(rowSums(m), decreasing = TRUE)
# Transform to a data frame
d <- data.frame(word = names(v), freq = v)
head(d, 10)
word freq
shall shall 11
fight fight 7
may may 6
will will 6
europe europe 5
upon upon 5
victory victory 5
war war 5
can can 4
many many 4
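To make the idea of a word-frequency table concrete, here is a minimal base R sketch that counts words in a short string and sorts them by frequency. It is not how tm builds its matrix and it skips tm's own defaults (for example its minimum word length); the sample sentence and the name `d_sketch` are assumptions for illustration.

```r
# Hypothetical sample text; the tutorial uses the Churchill corpus instead
sample_speech <- paste(
  "we shall fight on the beaches",
  "we shall fight on the landing grounds",
  "we shall never surrender"
)

# Split into lowercase words and drop empty tokens
tokens <- strsplit(tolower(sample_speech), "[^a-z]+")[[1]]
tokens <- tokens[tokens != ""]

# Count each word, then sort so the most frequent words come first
counts <- sort(table(tokens), decreasing = TRUE)

# Store the result as a data frame with one row per word
d_sketch <- data.frame(word = names(counts), freq = as.integer(counts))
print(head(d_sketch, 5))
```

The resulting data frame has the same two columns, word and freq, that the word cloud step expects.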
Step 5: Simple Word Cloud
wordcloud(words = d$word, freq = d$freq)
Step 6: Frequency
You can also adjust the number of words shown by specifying a minimum frequency.
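A minimal sketch of the minimum-frequency idea, assuming a frequency table shaped like `d`. The rows of `d_example` are copied from the output shown earlier, and the cutoff of 5 is an arbitrary choice. The filter itself is plain base R; the `wordcloud()` call with its `min.freq` argument only runs if the wordcloud package is installed.

```r
# Example frequency table using a few rows from the output above
d_example <- data.frame(
  word = c("shall", "fight", "may", "will", "europe", "can"),
  freq = c(11, 7, 6, 6, 5, 4)
)

# Keep only the words that appear at least `min_count` times
min_count <- 5  # arbitrary cutoff chosen for illustration
frequent <- d_example[d_example$freq >= min_count, ]
print(frequent)

# With the wordcloud package, min.freq applies the same kind of cutoff
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = d_example$word, freq = d_example$freq,
                       min.freq = min_count)
}
```

Raising the cutoff gives a sparser cloud that highlights only the dominant words.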
# Select our dataset
mtcars %>%
  # Move the row names (car names) into a variable called "group"
  add_rownames(var = "group") %>%
  # Rescale every numeric variable to a common range
  mutate(across(where(is.numeric), rescale)) %>%
  # Select which data to plot: the first 3 cars and the first 10 columns
  head(3) %>%
  select(1:10) -> mtcars_radar
Warning: `add_rownames()` was deprecated in dplyr 1.0.0.
ℹ Please use `tibble::rownames_to_column()` instead.
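The pipeline above does two things: it turns the row names into a column and rescales each numeric variable. A base R sketch of the same steps, assuming min-max rescaling to the range 0 to 1, could look like this. The helper `rescale01` and the object names are hypothetical, and the sketch avoids the deprecated `add_rownames()` without depending on dplyr or tibble.

```r
# Hypothetical helper: min-max rescaling of a numeric vector to [0, 1]
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Move the row names (car names) into a regular column called "group"
cars <- data.frame(group = rownames(mtcars), mtcars, row.names = NULL)

# Rescale every numeric column using all rows
numeric_cols <- sapply(cars, is.numeric)
cars[numeric_cols] <- lapply(cars[numeric_cols], rescale01)

# Keep the first 3 cars and the first 10 columns for plotting
radar_sketch <- head(cars, 3)[, 1:10]
print(radar_sketch)
```

Rescaling before taking the first three rows matters: each car is then placed relative to the whole dataset, not just relative to the other two cars.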
# This code generates many warnings, so let's suppress them
options(warn = -1)
ggradar(mtcars_radar)
Step 4: Output
We can try to improve the display, but the plot looks the same either way, so nothing is lost by skipping this.
# You must install the following
# devtools::install_github("IRkernel/IRkernel")
# IRkernel::set_plot_options(width=950, height=600, units='px')
# ggradar(mtcars_radar)
Waffle Charts
Waffle charts are a great way to visualize data in relation to a whole or to highlight progress against a given threshold.
Step 1: Libraries
library(ggplot2)
library(waffle)
Step 2: Implementation in R
First, we need to create a named vector with the household spending data from before.
To create our waffle chart, we will use the 'waffle' function.
expenses/1235: the divisor 1235 serves as a normalization factor. It scales the spending figures down so that the waffle chart has a visually representative number of squares, neither too many nor too few.
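The scaling step can be sketched as follows. The spending figures are hypothetical placeholders, since the original household data is not shown here, and the explicit rounding and `rows = 5` are assumptions made for the illustration. The chart itself is drawn only if the waffle package is installed.

```r
# Hypothetical household spending figures (placeholders, not the original data)
expenses <- c(Housing = 21000, Food = 7000, Transport = 9000, Other = 5000)

# Divide by 1235 so each square stands for roughly 1235 currency units
squares <- round(expenses / 1235)
print(squares)

# Draw the waffle chart only when the package is available
if (requireNamespace("waffle", quietly = TRUE)) {
  print(waffle::waffle(squares, rows = 5))
}
```

With these placeholder numbers the chart has a few dozen squares, which is easy to read at a glance. A much smaller divisor would produce thousands of tiny squares.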
A box plot summarizes the distribution of sorted numerical data.
The first quartile is the point 25% of the way through the sorted data.
In other words, a quarter of the data points are less than this value.
Similarly, 75% of the points are less than the third quartile value.
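The interquartile range is simply the difference between the third and first quartiles.
The median is effectively the second quartile.
The lower and upper whiskers extend from the box to cover values outside the interquartile range. In ggplot2's geom_boxplot() they reach the most extreme data points within 1.5 times the interquartile range of the box, and anything further out is drawn as an individual outlier point.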
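The quantities described above can be computed directly in base R. The sketch below uses a small hypothetical sample and also derives the 1.5 times IQR limits that many box plot implementations, including geom_boxplot() by default, use to decide how far the whiskers may reach.

```r
# Small hypothetical sample, already sorted
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

q1  <- unname(quantile(x, 0.25))  # first quartile
med <- median(x)                  # second quartile (median)
q3  <- unname(quantile(x, 0.75))  # third quartile
iqr <- q3 - q1                    # interquartile range

# Limits at 1.5 * IQR beyond the box; points outside count as outliers
lower_limit <- q1 - 1.5 * iqr
upper_limit <- q3 + 1.5 * iqr

cat("Q1:", q1, " median:", med, " Q3:", q3, " IQR:", iqr, "\n")
cat("Whiskers may reach from", lower_limit, "to", upper_limit, "\n")
```

A value such as 30 added to this sample would fall above the upper limit and would therefore be plotted as an outlier rather than stretching the whisker.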
An example…
Step 0: Example Data Frame
To make the results reproducible, we fix the seed of the random number generator. The data will still look random, but it will be the same every time the code is run.
Set A is sampled from the normal distribution with mean 1, standard deviation 2
Set B has mean 0, standard deviation 1
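Sets A and B as described above could be generated along these lines. The seed value 1234 is an assumption: the original seed is not shown, although this value appears consistent with the first rows printed later by head(df). The sample size of 200 per set is taken from the data frame code that follows.

```r
# Assumed seed; the original value is not shown in this tutorial
set.seed(1234)

# Set A: 200 draws from a normal distribution with mean 1 and sd 2
set_a <- rnorm(200, mean = 1, sd = 2)

# Set B: 200 draws from a normal distribution with mean 0 and sd 1
set_b <- rnorm(200, mean = 0, sd = 1)

# Resetting the seed reproduces exactly the same draws
set.seed(1234)
set_a_again <- rnorm(200, mean = 1, sd = 2)
print(identical(set_a, set_a_again))
```

Because the generator is seeded, anyone who runs this code with the same seed and R version should obtain the same numbers.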
We will place both sets into a single data frame and tell them apart with a label column.
# Create the data frame
df <- data.frame(label = factor(rep(c("A", "B"), each = 200)), value = c(set_a, set_b))
head(df)
label value
1 A -1.414131
2 A 1.554858
3 A 3.168882
4 A -3.691395
5 A 1.858249
6 A 2.012112
tail(df)
label value
395 B 0.52874502
396 B 0.78939440
397 B 0.45709951
398 B 0.53883312
399 B 0.01464312
400 B -0.91648914
Step 0.1: Importing packages
library(ggplot2)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Step 0.2: geom_boxplot()
ggplot(df, aes(x = label, y = value)) + geom_boxplot()
ggplotly()
Now let's explore the mtcars data…
Step 1: Reviewing the dataset
We are going to work with the first two variables in the top row: miles per gallon (mpg) and number of cylinders (cyl).
summary(mtcars)
mpg cyl disp hp
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
Median :19.20 Median :6.000 Median :196.3 Median :123.0
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
drat wt qsec vs
Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
Median :3.695 Median :3.325 Median :17.71 Median :0.0000
Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
am gear carb
Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :0.0000 Median :4.000 Median :2.000
Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000 Max. :8.000
Step 2: Creating box plots using qplot()
The variable cyl behaves more like a categorical variable than a numeric one, so it can be used to group the remaining values.
qplot(factor(cyl), mpg, data = mtcars, geom = "boxplot")
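To see what grouping by cyl means numerically, the base R sketch below splits mpg by the cylinder factor and summarizes each group. It is only an illustration of the grouping behind the box plot, not part of the plotting code; the object names are hypothetical.

```r
# Treat cyl as categorical and split mpg into one group per level
mpg_by_cyl <- split(mtcars$mpg, factor(mtcars$cyl))

# Number of cars and median mpg per group
# (the median is the line drawn inside each box)
group_sizes <- sapply(mpg_by_cyl, length)
group_medians <- sapply(mpg_by_cyl, median)

print(group_sizes)
print(group_medians)
```

Each element of `mpg_by_cyl` corresponds to one box in the plot, so there is one box each for 4, 6 and 8 cylinders.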